Bipartite spectral graph partitioning for clustering dialect varieties and detecting their linguistic features

Authors

  • Martijn Wieling
  • John Nerbonne
Abstract

In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, e.g. pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportunities in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant features which co-vary with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual features which are strongly associated, even while seeking groups of sites which share subsets of these same features. We show that the application of this method results in clear and sensible geographical groupings and discuss and analyze the importance of the concomitant features.
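
The abstract does not spell out the computation itself. As a rough illustration of the general technique, the sketch below follows the standard spectral co-clustering formulation (Dhillon, 2001) for a single two-way split: a sites-by-features matrix is normalized by its row and column sums, and the second singular vectors assign sites and features to groups at the same time. The function name `bipartite_spectral_bipartition`, the toy matrix, and the sign-based split (used here in place of k-means) are assumptions made for illustration, not the authors' data or implementation.

```python
import numpy as np


def bipartite_spectral_bipartition(A):
    """Split rows (sites) and columns (features) of A into two co-clusters.

    A[i, j] > 0 means site i exhibits feature j (an edge in the bipartite graph).
    Hypothetical helper: a sketch of Dhillon-style spectral co-clustering.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # degree of each site (row sums)
    d2 = A.sum(axis=0)  # degree of each feature (column sums)
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every site and every feature needs at least one edge")

    d1_inv_sqrt = 1.0 / np.sqrt(d1)
    d2_inv_sqrt = 1.0 / np.sqrt(d2)

    # Normalized matrix An = D1^{-1/2} A D2^{-1/2}
    An = d1_inv_sqrt[:, None] * A * d2_inv_sqrt[None, :]

    # Singular values come back in descending order; the first pair is trivial,
    # the second pair carries the two-way partition.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_sites = d1_inv_sqrt * U[:, 1]
    z_features = d2_inv_sqrt * Vt[1, :]

    # Simplification (assumption): split by sign instead of running k-means.
    site_labels = (z_sites > 0).astype(int)
    feature_labels = (z_features > 0).astype(int)
    return site_labels, feature_labels


# Toy data (invented): 4 sites x 5 features, two groups joined by one weak link
# (site 2 also shows feature 2).
toy = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
site_labels, feature_labels = bipartite_spectral_bipartition(toy)
print("site labels:   ", site_labels)
print("feature labels:", feature_labels)
```

Sites and features that receive the same label form one co-cluster, so each group of sites comes with the features associated with it. Which group is labeled 0 and which 1 depends on the arbitrary sign of the singular vectors, so only label agreement is meaningful.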

Similar resources

Hierarchical bipartite spectral graph partitioning to cluster dialect varieties and determine their most important linguistic features

In this study we apply a hierarchical bipartite spectral graph partitioning method to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectol...

Hierarchical Spectral Partitioning of Bipartite Graphs to Cluster Dialects and Identify Distinguishing Features

In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectology. ...

Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology

In this study we used bipartite spectral graph partitioning to simultaneously cluster varieties and sound correspondences in Dutch dialect data. While clustering geographical varieties with respect to their pronunciation is not new, the simultaneous identification of the sound correspondences giving rise to the geographical clustering presents a novel opportunity in dialectometry. Earlier metho...

Patterns of language variation and underlying linguistic features: a new dialectometric approach

For almost forty years quantitative methods have been applied to the analysis of dialect variation: these methods focused mostly on identifying the most important dialectal groups using an aggregate analysis of the linguistic data (Séguy 1973; Goebl 1984; Nerbonne et al. 1999). While viewing dialect differences at an aggregate level certainly gives a more comprehensive view than the analysis of...

Analyzing phonetic variation in the traditional English dialects: Simultaneously clustering dialects and phonetic features

This study explores the linguistic application of bipartite spectral graph partitioning, a graph-theoretic technique that simultaneously identifies clusters of similar localities as well as clusters of features characteristic of those localities. We compare the results using this approach to previously published results on the same dataset using cluster and principal component analysis (Shacklet...

Journal:
  • Computer Speech & Language

Volume 25

Publication date: 2011